A novel research method to measure the usage of web-based information

This research method contributes to the literature by providing a commensurable measure of ‘the usage of information for (or web traffic on) web-based research studies’. The introduced method deepens the understanding of the functionality of online media by focusing on specific web metrics, so that this type of media can be used more efficiently for disclosure purposes. The inputs of the new method are based on publicly available data, and it can be applied in three consecutive steps. Accordingly, this method can be used to assess stakeholders’ engagement for any web-based research study. In sum:
• The inputs of the method are publicly available data.
• The method is applicable to a variety of web-based research studies, regardless of the methodological approach applied (e.g., qualitative, quantitative).


Introduction
This paper explores the assessment of web disclosure by determining and analyzing stakeholders' usage of (or web traffic on) online CSR information (published on web 1.0 websites) as an indicator of their engagement. Web traffic is the amount of data received or sent by visitors, who are mostly stakeholders [4]. Their traffic, despite its supposedly ambiguous identification, is the motivation behind most website reconstructions and improvements [9]. In this context, this paper faces two technical challenges: first, introducing an applicable tool/model to estimate the usage of (or web traffic on) information; second, if applicable, identifying the probable influence of the ambiguity that might bias the outputs of the introduced method due to non-stakeholder usage of CSR information. It is worth stating that real web traffic statistics are protected by many conventions, laws, and regulations that restrict personal data usage and information registration¹. In other words, the real web metrics of websites are not publicly available.

Literature review
The literature on estimating web traffic relies on data retrieved from popular directories, such as Alexa [14], Compete [17], or SimilarWeb [14]. In other words, many popular websites (web directories) provide their users with such data. However, some provide it only to website owners (e.g., Google Analytics), others provide chargeable web traffic data for home pages only (e.g., Alexa, SimilarWeb), and others make the data available only for global companies (e.g., Compete). Archive.org is the only public data source that freely provides² its users with data that, as I argue in this paper, are equivalent to web traffic data for millions of websites worldwide, including those of all the oil companies in the research sample. As shown in the research method, Archive.org has never been used for this purpose. From a web metrics perspective, the extent of stakeholders' engagement might be associated with the time spent navigating the website or specific web pages [6], the number of visits [8], the number of page views per visitor [2], the number of returning visitors [12], or the number of links on the website [18]. However, no prior study has examined stakeholders' engagement using real or estimated web traffic, which this paper uniquely seeks to estimate.
As mentioned before, the second technical challenge in this research study is identifying the probable influence of analytic ambiguity, which might bias the introduced method's outputs due to the nested non-stakeholder usage of CSR information. Website visitors, in general, can be considered firms' stakeholders even though websites are public and free to access. The validity of this assumption rests on two points. First, reconstructed websites [and their web contents] prioritise the potential interests of targeted stakeholders [5]. Second, inflating web traffic by attracting irrelevant visitors, such as by adding irrelevant content to increase website ranking via tagging, is an unsustainable tactic [7]. Accordingly, significant visitors, who are mostly firms' stakeholders, are neither recognised nor attracted by websites at random. Non-relevant visitors (non-stakeholders) are unlikely to navigate beyond the home page, where most of the main gateways of the web hierarchy are located and where CSR information is published only to a limited extent [10].
The website's popularity might significantly change the composition of relevant users [stakeholders] in terms of the usage of information. Visitors arriving from search engines, where websites are likely to appear in random searches, make up two-thirds of all visitors and have the longest session span compared with direct visitors [11]. Ortega and Aguillo (2010) found that search-engine visitors are more relevant to websites than direct visitors, presumably because they visit websites on purpose. Accordingly, irrelevant visitors [non-stakeholders] are mostly neither direct nor search-engine visitors, the two groups that make up most web visitors (Ortega & Aguillo, 2010; Pakkala et al., 2012). By contrast, Plaza [13] found that direct and search-engine visitors, in that order, are the most significant users [stakeholders] of web information from a visit-length perspective. Perhaps this is because the studied website serves a [local] scholarly community in Bilbao City³. In terms of penetration-type visitors, this example could be close to the firms in this paper's research sample, which are domestic oil companies. Therefore, direct and search-engine visitors generally make up most of a website's visitors (stakeholders). Which stakeholders are most important depends on the audience that the website content targets. First, if the content targets global stakeholders, search-engine visitors are the most significant stakeholders, without diminishing the significance of direct visitors, who clearly do not visit websites at random. Second, if the content targets domestic stakeholders, both direct and search-engine visitors are significant stakeholders.

Detecting the traffic on web disclosure
This section demonstrates a unique contribution to the web-based literature in general. CSR-related information is the first application of measuring stakeholders' engagement by determining their usage of specific information from a publicly available data source. To illustrate how difficult such information is to obtain, Zotano et al. [20] developed their findings on 'mass media websites' using the web metrics of just one website, that of a popular TV channel in Spain. Moreover, these data relate to the main domain (home page) and do not reflect the web traffic on the other web pages branching from the home page. In general, most of the real web traffic data sources used in the verification test cover a very limited time span for only a few firms.
Obtaining detailed web traffic statistics for CSR information published over eight years on 13 bilingual websites, owned by oil companies working in a context where many researchers have experienced serious data-collection difficulties [1], could be extremely challenging. Accordingly, alternative (and practically usable) data sources of web traffic statistics are needed. As illustrated before, Archive.org is used uniquely for this purpose. The official 'terms of use'⁴ web page of Archive.org states in its third paragraph that '… In using the Archive's site, Collections, and/or services, you further agree (a) not to violate anyone's rights of privacy.' Accordingly, the statistical 'Collections', that is, the 'number of captures' (or snapshots) of the studied websites during the research time horizon, have been used consistently in this paper for academic purposes. Consequently, all analytic methods, techniques, and collected data of this paper are either introduced or utilised to contribute to the academic literature in general and to support the studied context in particular. However, using the 'number of captures'⁵ as an indicator of 'web traffic' raises a challenging question: 'whether the number of captures accurately measures web traffic or web popularity.' Describing its mechanism for capturing websites, Archive.org states, 'Internet Archive's crawls tend to find sites that are well linked from other sites'⁶. This is known as an 'in-links' approach to estimating site traffic, which is originally created by the stakeholders themselves [3]. To validate the data retrieved from Archive.org, analysing real website metrics is arguably the most appropriate way to verify the association between the number of captures (snapshots) of a web page recorded on Archive.org during a period of time and its real web traffic during the same period.
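The association described above can be checked by correlating, over a shared period, the number of Archive.org captures of a page with its real visits. The paper does not state which statistic underlies its verification tests, so the sketch below is only an illustration under that assumption: it computes Pearson's r and Spearman's rank correlation (the latter is less sensitive to the skewed distributions typical of traffic data) in plain Python. The function names and sample series are hypothetical.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length (at least 2)")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(xs: list[float], ys: list[float]) -> float:
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson(ranks(xs), ranks(ys))
```

With the illustrative weekly series `captures = [2, 3, 1, 5, 4]` and `visits = [120, 180, 90, 300, 210]`, `spearman` returns 1 up to floating-point rounding (the two series have identical orderings) and `pearson` returns roughly 0.98.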
I have obtained real web traffic data from four different sources, as shown in Table 1 .
According to the results of these four verification tests, the 'number of captures' data provided by Archive.org can be accepted as 'web traffic' data. Accordingly, all data records of CSR disclosure must be checked⁷, and the number of captures in each data record must be collected, as presented in Appendix 1. The unit of 'web traffic' is termed the Statistical Incidence of Stakeholders' Usage of Online Data (SISUOD).
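The paper reads the number of captures from Archive.org's web interface. As a hedged, programmatic alternative, the sketch below queries the Wayback Machine's CDX server (https://web.archive.org/cdx/search/cdx), which returns one row per capture. Using this endpoint, the optional filter on HTTP status 200, and the helper names are assumptions rather than part of the described method, and CDX counts may not match the interface's summary exactly (for example, because of duplicate or redirected captures).

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Wayback Machine CDX server: an assumed programmatic substitute for
# reading the 'summary of captures' page by hand.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def fetch_cdx_rows(page_url: str, start: str, end: str) -> list[list[str]]:
    """Return CDX rows for captures of page_url between start and end.

    start/end are Wayback timestamp prefixes such as '2008' or '20151231'.
    With output=json the first row holds the field names.
    """
    query = urlencode({"url": page_url, "output": "json", "from": start, "to": end})
    with urlopen(f"{CDX_ENDPOINT}?{query}", timeout=30) as resp:
        body = resp.read().decode("utf-8")
    return json.loads(body) if body.strip() else []


def count_captures(rows: list[list[str]], ok_only: bool = False) -> int:
    """Count captures (SISUODs) in CDX JSON rows, skipping the header row.

    If ok_only is True, only captures recorded with HTTP status 200 count.
    """
    if not rows:
        return 0
    header, *records = rows
    if ok_only:
        status_col = header.index("statuscode")
        records = [r for r in records if r[status_col] == "200"]
    return len(records)
```

For a hypothetical page, `count_captures(fetch_cdx_rows("example.com/csr.html", "2008", "2015"))` would give the number of captures in the research horizon; parsing is kept separate from fetching so that it can be checked offline.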
The required steps to collect data, as illustrated in Fig. 1, start with displaying a report provided by Archive.org on the 'summary of captures' of the web page concerned throughout its lifetime. It is worth noting that the total number of SISUODs for this web page is five, as shown in Fig. 1. This method of generating web traffic data can be used for any website (or web page) archived since Archive.org was established in 1996. However, I have noticed that the snapshotting of websites became regular from the summer of 2004 onwards, as shown in Fig. 2; that point of mature performance of Archive.org is indicated by the vertical dashed line in Fig. 2. Before 2004, the SISUODs of any website are unlikely to be significantly associated with its real web traffic data. This is why, although the real web traffic data for Monash University's website are from 2002, I collected its weekly SISUODs from Archive.org's 2005 captures.
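The weekly collection and the 2004 threshold described above could be automated along the following lines, given a list of capture timestamps (for instance, the timestamp column of Wayback CDX output, which uses 14-digit YYYYMMDDhhmmss strings). The cut-off of 1 June 2004 is an assumed stand-in for 'the summer of 2004', and grouping by ISO week is one reasonable reading of 'every week'; neither is specified in the text, and the function name is hypothetical.

```python
from collections import Counter
from datetime import date, datetime

# Assumed cut-off: the text places regular crawling from "the summer of 2004";
# 1 June 2004 is an illustrative choice, not a date given by the source.
REGULAR_CRAWL_START = date(2004, 6, 1)


def weekly_sisuods(timestamps: list[str],
                   cutoff: date = REGULAR_CRAWL_START) -> dict[tuple[int, int], int]:
    """Count captures per ISO (year, week), ignoring captures before cutoff.

    timestamps are 14-digit Wayback strings such as '20050314093000';
    only the date part (first 8 characters) is used.
    """
    counts: Counter = Counter()
    for ts in timestamps:
        day = datetime.strptime(ts[:8], "%Y%m%d").date()
        if day < cutoff:
            continue  # pre-regular crawling: unlikely to track real traffic
        iso = day.isocalendar()
        counts[(iso[0], iso[1])] += 1
    return dict(sorted(counts.items()))
```

For example, captures on 5 January 2003, 3 and 5 January 2005, and 12 January 2005 yield two SISUODs in ISO week 1 of 2005 and one in week 2; the 2003 capture is discarded.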

An empirical application
As mentioned before, the introduced method has been applied to CSR information published on the websites of 13 oil companies operating in Libya. The research time horizon covers eight years, from 2008 to 2015. According to the collected data, not all web pages are dedicated exclusively to CSR information. The web traffic on such pages is likely to be generated by various web contents rather than by the CSR information itself. Furthermore, there is no significant difference in the visual emphasis given to textual content (e.g., coloured or flashing fonts, a distinctive background colour) on any web page; all textual content has been formatted similarly. Accordingly, the web traffic on CSR information (in each data record) should be adjusted in proportion to its textual size relative to the whole textual capacity of the web page (vehicle) at the date and time the CSR disclosure was published. In this way, I separated the (adjusted) web traffic of CSR information from that of non-CSR information, which share the same web pages in different proportions. The final tabulation of the adjusted web traffic (the outputs of the introduced method) for all data records is presented in Table 2. It shows the estimated web traffic on five different types of CSR information found on the websites of oil companies with different types of ownership.
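Read as a proportional weighting, the adjustment gives adjusted traffic = SISUODs × (CSR text size / page text size) for each data record, and the results can then be summed per cell of a Table 2-style grid. The sketch below assumes that reading, measures textual size as a word count (the paper does not name the unit), and uses hypothetical field names and placeholder ownership and CSR-type labels rather than the categories reported in Table 2.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class DataRecord:
    """One CSR disclosure record; field names are hypothetical."""
    company: str
    ownership: str      # ownership type of the oil company (placeholder label)
    csr_type: str       # category of CSR information (placeholder label)
    sisuods: int        # captures of the hosting page in the period
    csr_words: int      # textual size of the CSR disclosure (assumed: word count)
    page_words: int     # whole textual capacity of the page when published

    def adjusted_traffic(self) -> float:
        """SISUODs weighted by the CSR share of the page's text."""
        if self.page_words <= 0 or not 0 <= self.csr_words <= self.page_words:
            raise ValueError("require page_words > 0 and 0 <= csr_words <= page_words")
        return self.sisuods * self.csr_words / self.page_words


def tabulate(records: list[DataRecord]) -> dict[tuple[str, str], float]:
    """Sum adjusted traffic per (ownership, CSR type) cell."""
    table: dict[tuple[str, str], float] = defaultdict(float)
    for record in records:
        table[(record.ownership, record.csr_type)] += record.adjusted_traffic()
    return dict(table)
```

For instance, a page captured 10 times whose CSR disclosure occupies 200 of its 800 words contributes 10 × 200 / 800 = 2.5 adjusted SISUODs to its cell; a page devoted entirely to CSR keeps all of its captures.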

Conclusion
The approach adopted in this research study could be considered an embryonic stage of quantifying the concept of 'stakeholders' engagement', especially on firms' websites that lack public 'advocacy advertising' [16] features through which stakeholders' opinions and perceptions about online CSR content could be understood qualitatively.
This novel method has been established by redefining statistical data from publicly available sources. It could be very helpful for researchers studying stakeholders' usage of web content in the CSR area or in any other web-based research study. In other words, it could be the threshold of digitalizing stakeholders' engagement with web-based content.
The method adopted in this paper, using web metrics to study stakeholders' engagement and accessibility, might guide firms to deepen their understanding of stakeholders' expectations and needs. Moreover, it provides them with updated indicators of stakeholders' agreement about those expectations and needs. For example, the specification and prioritization of web-based CSR agendas can be assessed longitudinally using this method, mathematically determining the firm's awareness of the significance of the web-based CSR hierarchy compared with the whole size of the website. A comprehensive picture of the institutional awareness of using such media should be drawn up through a dimensional analysis of the web-based CSR hierarchy, for example, how close web CSR information is to (or how far it is from) the home page. This applies not only to firms as CSR communicators; it also helps interpret trends in stakeholders' usage of web-based CSR content.
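A dimensional analysis of this kind could start from the click depth of each page, that is, the minimum number of links a visitor must follow from the home page. The sketch below computes it with a breadth-first search over a site's internal link graph and also reports the share of reachable pages that carry CSR content, as one possible proxy for the size of the CSR hierarchy relative to the whole website. The link graph, page paths, function names, and both measures are hypothetical illustrations, not part of the method described in this paper.

```python
from collections import deque


def click_depths(links: dict[str, list[str]], home: str) -> dict[str, int]:
    """Breadth-first search from the home page: minimum number of clicks
    needed to reach each page through internal links."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:  # first visit in BFS = shortest path
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def csr_share(depths: dict[str, int], csr_pages: list[str]) -> float:
    """Share of reachable pages that carry CSR content."""
    reachable_csr = [p for p in csr_pages if p in depths]
    return len(reachable_csr) / len(depths)


# Hypothetical site structure for illustration only.
site = {
    "/": ["/about", "/news"],
    "/about": ["/about/csr"],
    "/news": ["/news/2010"],
    "/about/csr": ["/about/csr/environment"],
}
depths = click_depths(site, "/")
```

In this hypothetical site, the environmental CSR page sits three clicks from the home page, and two of the six reachable pages carry CSR content; repeating the calculation on archived versions of a website over time would give a longitudinal view of how prominently CSR information is placed.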

Ethics statements
MethodsX has ethical guidelines that all authors must comply with.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data Availability
Data will be made available on request.